Abstract. The problem of answering queries posed to a peer who is a 
member of a peer-to-peer data exchange system is studied. The answers 
have to be consistent with respect to both the local semantic constraints and 
the data exchange constraints with other peers; and must also respect 
certain trust relationships between peers. A semantics for peer consistent 
answers under exchange constraints and trust relationships is introduced 
and some techniques for obtaining those answers are presented. 

1 Introduction 

In this paper the problem of answering queries posed to a peer who is a member 
of a peer-to-peer data exchange system is investigated. When a peer P receives a 
query and is going to answer it, it may need to consider both its own data and 
the data stored at other peers' sites if those other peers are related to P by data 
exchange constraints (DECs). Keeping the exchange constraints satisfied may
require peer P to get data from other peers to complement its own data, but
also to refrain from using part of its own data. In which direction P goes depends not only
on the exchange constraints, but also on the trust relationships that P has with
other peers. For example, if P trusts another peer Q's data more than its own, P
will accommodate its data to Q's data in order to keep the exchange constraints 
satisfied. Another element to take into account in this process is a possible set 
of local semantic constraints that each individual peer may have. 

Given a network of peers, each with its own data, and a particular peer P in it, 
a solution for P is -loosely speaking- a global database instance that respects the 
exchange constraints and trust relationships P has with its immediate neighbors 
and stays as close as possible to the available data in the system. Since the 
answers from P have to be consistent wrt both the local semantic constraints
and the data exchange constraints with other peers, the peer consistent answers
(PCAs) from P are defined as those answers that can be retrieved from P's portion
of data in every possible solution for P. This definition may suggest that P may
change other peers' data, especially that of the peers it considers less reliable, but this is
not the case. The notion of solution is used as an auxiliary notion to characterize
the correct answers from P's point of view. Ideally, P should be able to obtain its
peer consistent answers just by querying the already available local instances.
This resembles the approach to consistent query answering (CQA) in databases
[1,8], where answers to queries that are consistent with given ICs are computed
without changing the original database.

We give a precise semantics for peer consistent answers to first-order queries,
first for the direct case, where transitive relationships between peers via DECs
are not automatically considered, and at the end for the transitive case. We also
illustrate, by means of extended and representative examples, mechanisms for
obtaining PCAs (a full treatment is left for an extended version of this paper).
One of the approaches is first-order (FO) query rewriting, where the original
query is transformed into a new query whose standard answers are the PCAs to
the original one. This methodology has intrinsic limitations. The second, more
general, approach is based on a specification of the solutions for a peer as the
stable models of a logic program, which captures the different ways the system
stabilizes once the DECs and the trust relationships are satisfied.

We first recall the definition of database repair that is used to characterize
the consistent answers to queries in single relational databases wrt certain
integrity constraints (ICs) [1]. Given a relational database instance $r$ with schema
$\mathcal{R}$ (which includes a domain $D$), $\Sigma(r)$ is the set of ground atomic formulas
$\{P(\bar{a}) \mid P \in \mathcal{R} \text{ and } r \models P(\bar{a})\}$.

Definition 1. (a) Let $r_1, r_2$ be database instances over $\mathcal{R}$. The distance,
$\Delta(r_1, r_2)$, between $r_1$ and $r_2$ is the symmetric difference
$\Delta(r_1, r_2) = (\Sigma(r_1) \setminus \Sigma(r_2)) \cup (\Sigma(r_2) \setminus \Sigma(r_1))$.

(b) For database instances $r, r_1, r_2$, we define $r_1 \leq_r r_2$ if $\Delta(r, r_1) \subseteq \Delta(r, r_2)$.

(c) Let $IC$ be a set of ICs on $\mathcal{R}$. A repair of an instance $r$ wrt $IC$ is a $\leq_r$-minimal
instance $r'$, such that $r' \models IC$. □
A repair of an instance r is a consistent instance that minimally differs from r. 
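
As an illustration of Definition 1, the following Python sketch enumerates the repairs of a tiny instance by brute force. The relation name `P`, the functional dependency used as IC, and the restriction of candidate instances to subsets of the original atoms (which suffices for an FD, where insertions never help) are assumptions made for the example; they are not taken from the paper.

```python
from itertools import combinations

def delta(r1, r2):
    """Distance of Definition 1(a): symmetric difference of two sets of ground atoms."""
    return (r1 - r2) | (r2 - r1)

def satisfies_fd(inst):
    """Hypothetical IC: in relation P, the first argument determines the second."""
    return all(y == z for (_, x, y) in inst for (_, x2, z) in inst if x == x2)

def repairs(r, ic):
    """Consistent subsets of r whose distance to r is subset-minimal (Definition 1(c)).
    Only deletions are explored, which is an assumption valid for denial-like ICs."""
    atoms = sorted(r)
    consistent = [frozenset(c) for k in range(len(atoms) + 1)
                  for c in combinations(atoms, k) if ic(frozenset(c))]
    return [i for i in consistent
            if not any(delta(r, j) < delta(r, i) for j in consistent)]

# An inconsistent instance: P(a,b) and P(a,c) violate the FD.
r = frozenset({("P", "a", "b"), ("P", "a", "c"), ("P", "d", "e")})
for rep in repairs(r, satisfies_fd):
    print(sorted(rep))
```

Under these assumptions the instance has exactly two repairs, each dropping one of the two conflicting atoms and keeping `P(d, e)`.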

2 A Framework for P2P Data Exchange 

In this section we will describe the framework we will use to formalize and 
address the problem of query answering in P2P systems. 

Definition 2. A P2P data exchange system $\mathfrak{P}$ consists of:

(a) A finite set $\mathcal{P}$ of peers, denoted by A, B, ...

(b) For each peer P, a database schema $\mathcal{R}(P)$, that includes a domain $D(P)$, and
relations $R(P), \ldots$. However, it may be natural and convenient to assume that all
peers share a common, fixed, possibly infinite domain, $D$. Each $\mathcal{R}(P)$ determines
a FO language $\mathcal{L}(P)$. We assume that the schemata $\mathcal{R}(P)$ are disjoint, with the
domains being the only possible exception. $\mathcal{R}$ denotes the union of the $\mathcal{R}(P)$s.

(c) For each peer P, a database instance $r(P)$ corresponding to schema $\mathcal{R}(P)$.

(d) For each peer P, a set $IC(P)$ of $\mathcal{L}(P)$-sentences, the ICs on $\mathcal{R}(P)$.

(e) For each peer P, a collection $\Sigma(P)$ of data exchange constraints $\Sigma(P, Q)$
consisting of sentences written in the FO language for the signature $\mathcal{R}(P) \cup \mathcal{R}(Q)$,
where the Q's are (some of the) other peers in $\mathcal{P}$.

(f) A trust relation $trust \subseteq \mathcal{P} \times \{less, same\} \times \mathcal{P}$, with the intended semantics that
when $(A, less, B) \in trust$, peer A trusts itself less than B; while $(A, same, B) \in trust$
indicates that A trusts itself the same as B. In this relation, the second argument
functionally depends on the other two. □
Each peer P is responsible for the update and maintenance of its instance wrt
$IC(P)$, independently from other peers. In particular, we assume $r(P) \models IC(P)$.^

^ It would not be difficult to extend this scenario to one that allows local violations of 
ICs. Techniques as those described in ^ could be used in this direction. 



Peers may submit queries to another peer in accordance with the restrictions 
imposed by the DECs and using the other peer's relations appearing in them. 
Definition 3. (a) We denote with $\overline{\mathcal{R}}(P)$ the schema consisting of $\mathcal{R}(P)$ extended
with the other peers' schemas that contain predicates appearing in $\Sigma(P)$.

(b) For a peer P and an instance $r$ on $\mathcal{R}(P)$, we denote by $\bar{r}$ the database instance
on $\overline{\mathcal{R}}(P)$ consisting of the union of $r$ with all the peers' instances whose schemas
appear in $\overline{\mathcal{R}}(P)$.

(c) If $r$ is an instance over a certain schema $S$ and $S'$ is a subschema of $S$,
then $r|S'$ denotes the restriction of $r$ to $S'$. In particular, if $\mathcal{R}(P) \subseteq S$, then $r|P$
denotes the restriction of $r$ to $\mathcal{R}(P)$.

(d) We denote by $\mathcal{R}(P)^{less}$ the union of all schemata $\mathcal{R}(Q)$, with $(P, less, Q) \in trust$.
$\mathcal{R}(P)^{same}$ is defined analogously. □

From the perspective of a peer P, its own database may be inconsistent wrt the
data owned by another peer Q and the DECs in $\Sigma(P, Q)$. Only when P trusts Q
the same as or more than itself does it have to consider Q's data. When P queries its
database, these inconsistencies may have to be taken into account. Ideally, the
answers to the query obtained from P should be consistent with $\Sigma(P, Q)$ (and its
own ICs $IC(P)$). In principle, P, who is not allowed to change other peers' data,
could try to repair its database in order to satisfy $\Sigma(P) \cup IC(P)$. This is not
a realistic approach. Rather P should solve its conflicts at query time, when it
queries its own database and those of other peers. Any answer obtained in this
way should be sanctioned as correct wrt a precise semantics.

The semantics of peer consistent query answers for a peer P is given in terms 
of all possible minimal, virtual, simultaneous repairs of the local databases that 
lead to a satisfaction of the DECs while respecting P's trust relationships to 
other peers. This repair process may lead to alternative global databases called 
the solutions for P. Next, the peer consistent answers from P are those that
are invariant wrt all its solutions. A peer's solution captures the idea that
only some peers' databases are relevant to P, those whose relations appear in its
trusted exchange constraints, and are trusted by P at least as much as it trusts
its own data. In this sense, this is a "local notion", because it does not take into
consideration transitive dependencies (but see Section 4.3).
Definition 4. (direct case) Given a peer P in a P2P data exchange system $\mathfrak{P}$
and an instance $r$ on $\mathcal{R}$, we say that an instance $r'$ on $\mathcal{R}$ is a solution for P
if, simultaneously: (a) $r' \models \Sigma(P) \cup IC(P)$. (b) $r'|R = r|R$ for every predicate
$R \notin \overline{\mathcal{R}}(P)$. (c) There are instances $r_1, r_2$ over $\mathcal{R}$ satisfying: (c1) $r_2 = r'$.
(c2) $r_1$ is a repair of $r$ wrt $\bigcup \{\Sigma(P, Q) \mid (P, less, Q) \in trust\}$, with $r_1|Q = r|Q$
whenever $(P, less, Q) \in trust$ or $(P, same, Q) \in trust$. (c3) $r_2$ is a repair of $r_1$ wrt
$\bigcup \{\Sigma(P, Q) \mid (P, same, Q) \in trust\}$, such that $r_2 \models \Sigma(P, Q)$ and $r_2|Q = r_1|Q$ for
those peers Q with $(P, less, Q) \in trust$. □

The solutions for a peer are used as a conceptual, auxiliary tool to characterize 
the semantically correct answers to a peer's queries. We are not interested in 
computing a peer's solutions per se. Solutions (and repairs) are virtual and 
may be only partially computed if necessary, if this helps us to compute the 
correct answers obtained in/from a peer. The "changes" that are implicit in the
definition of solution via the set differences are expected to be minimal wrt
sets of tuples which are inserted/deleted into/from the tables.

In intuitive terms, a solution for P repairs the global instance, but leaves
unchanged the tables that do not appear in its trusted exchange constraints and those
tables that belong to peers that P trusts more than itself. With this condition,
P first tries to change its own tables according to what the dependencies to more
trusted peers prescribe. Next, keeping those more trusted dependencies
satisfied, it tries to repair its own or other peers' data, but only considering those
peers who are trusted equally to itself.

In these definitions we find clear similarities with the characterization of 
consistent query answers in single relational databases [5]. However, in P2P
query answering, repairs may involve data associated to different peers, and 
also a notion of priority that is related to the trust relation (other important 
differences are discussed below). 

Example 1. Consider a P2P data system with peers P1, P2, P3, and schemas
$\mathcal{R}_i = \{R^i, \ldots\}$, and instances $r^i$, $i = 1, 2, 3$, resp.; and:
(a) $r^1 = \{R^1(a, b), R^1(s, t)\}$, $r^2 = \{R^2(c, d), R^2(a, e)\}$, $r^3 = \{R^3(a, f), R^3(s, u)\}$.
(b) $trust = \{(P1, less, P2), (P1, same, P3)\}$.
(c) $\Sigma(P1, P2) = \{\forall x \forall y (R^2(x, y) \rightarrow R^1(x, y))\}$;
$\Sigma(P1, P3) = \{\forall x \forall y \forall z (R^1(x, y) \wedge R^3(x, z) \rightarrow y = z)\}$.

Here, the global instance is $r = \{R^1(a, b), R^1(s, t), R^2(c, d), R^2(a, e), R^3(a, f), R^3(s, u)\}$.
The solutions for P1 are obtained by first repairing $r$ wrt the relationship
between P1 and P2. Then $r_1$ in condition (c2) in Definition 4 is
$r_1 = \{R^1(a, b), R^1(s, t), R^1(c, d), R^1(a, e), R^2(c, d), R^2(a, e), R^3(a, f), R^3(s, u)\}$.
In this example there is only one repair at this stage, but in other situations there
might be several. Now, this repair has to be repaired in its turn wrt the data
dependency between P1 and P3 (but keeping the relationship between P1 and P2
satisfied). In this case, we obtain only two repairs,
$r' = \{R^1(a, b), R^1(s, t), R^1(c, d), R^1(a, e), R^2(c, d), R^2(a, e)\}$ and
$r'' = \{R^1(a, b), R^1(c, d), R^1(a, e), R^2(c, d), R^2(a, e), R^3(s, u)\}$.
These are the only solutions for peer P1. □
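
The two-stage construction of Example 1 can be replayed by brute force. The sketch below is only an illustration under stated assumptions: atoms are encoded as tuples `(relation, x, y)`, candidate changes are limited to the tuples of P2 that can be imported into P1's table and to deletions of existing atoms, and condition (c3) is read as "repair wrt the DEC with P3, then keep only the repairs that still satisfy the DEC with P2". The helper names (`rel`, `repairs`, `dec_p1_p2`, `dec_p1_p3`) are hypothetical.

```python
from itertools import combinations

# Global instance r of Example 1; an atom is (relation, x, y).
r = frozenset({("R1", "a", "b"), ("R1", "s", "t"),
               ("R2", "c", "d"), ("R2", "a", "e"),
               ("R3", "a", "f"), ("R3", "s", "u")})

def rel(inst, name):
    """Tuples of relation `name` in instance `inst`."""
    return {(x, y) for (n, x, y) in inst if n == name}

def dec_p1_p2(inst):
    """DEC between P1 and P2: every R2 tuple is also an R1 tuple."""
    return rel(inst, "R2") <= rel(inst, "R1")

def dec_p1_p3(inst):
    """DEC between P1 and P3: R1 and R3 agree on the second argument per key."""
    return all(y == z for (x, y) in rel(inst, "R1")
               for (x2, z) in rel(inst, "R3") if x == x2)

def repairs(base, flexible, pool, constraint):
    """Instances satisfying `constraint` at subset-minimal distance from `base`.
    Only atoms of the `flexible` relations may change; they are drawn from `pool`."""
    fixed = frozenset(a for a in base if a[0] not in flexible)
    free = sorted(a for a in pool if a[0] in flexible)
    consistent = [fixed | frozenset(c) for k in range(len(free) + 1)
                  for c in combinations(free, k)
                  if constraint(fixed | frozenset(c))]
    return [i for i in consistent
            if not any((j ^ base) < (i ^ base) for j in consistent)]

# Stage (c2): repair wrt the more trusted peer P2; only P1's table R1 may change.
pool1 = r | {("R1", x, y) for (x, y) in rel(r, "R2")}
stage1 = repairs(r, {"R1"}, pool1, dec_p1_p2)

# Stage (c3): repair wrt the equally trusted peer P3; R1 and R3 may change,
# R2 stays fixed, and the DEC with P2 must remain satisfied.
solutions = [r2 for r1 in stage1
             for r2 in repairs(r1, {"R1", "R3"}, r1, dec_p1_p3)
             if dec_p1_p2(r2)]

for sol in solutions:
    print(sorted(sol))
```

With these assumptions, `stage1` contains a single instance and `solutions` contains the two instances that the example calls r' and r''.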

The minimization involved in a solution is similar to a prioritized minimization
(with some predicates that are kept fixed) found in non-monotonic reasoning
[25]. Actually, the notion of consistent query answer, even the one based on the
non-prioritized version of repair (cf. Definition 1), is a non-monotonic notion.^

Notice that the notion of a solution for a peer P is a "local notion" in the sense 
that it considers the "direct neighbors" of P only. One reason for considering this 
case is that P does not see beyond its neighbors; and when P requests data from
a neighbor, say Q, the latter may decide -or even P may decide a priori and in
a uniform way- that for P it is good enough to accommodate its data to its
neighbors alone, without considering any transitive dependencies. In Section 4.3
we will explore the case of interrelated dependencies.

Now we can define which are the intended answers to a query posed to a 
peer, from the perspective of that peer. 

^ A circumscriptive approach to database repairs was given in It should not be 
difficult to extend that characterization to capture the peer solutions. 



Definition 5. Given a FO query $Q(\bar{x}) \in \mathcal{L}(P)$, posed to peer P, a ground tuple
$\bar{t}$ is peer consistent for P iff $r'|P \models Q(\bar{t})$ for every solution $r'$ for P. □
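
Definition 5 amounts to intersecting the answers obtained on P's portion of each solution. A minimal Python sketch follows, assuming that the solutions are already available as sets of atoms `(relation, x, y)`. The two solutions are the ones listed in Example 1; the function name `peer_consistent_answers` and the encoding of a query as a Python function are hypothetical choices for the illustration.

```python
def peer_consistent_answers(solutions, own_relations, query):
    """Tuples returned by `query` on P's portion of every solution (Definition 5).
    With no solutions every tuple qualifies vacuously, so that case is rejected here."""
    if not solutions:
        raise ValueError("no solutions: every tuple is vacuously peer consistent")
    answer_sets = []
    for sol in solutions:
        restricted = frozenset(a for a in sol if a[0] in own_relations)  # r'|P
        answer_sets.append(set(query(restricted)))
    return set.intersection(*answer_sets)

# The two solutions for P1 listed in Example 1.
sol_a = frozenset({("R1", "a", "b"), ("R1", "s", "t"), ("R1", "c", "d"),
                   ("R1", "a", "e"), ("R2", "c", "d"), ("R2", "a", "e")})
sol_b = frozenset({("R1", "a", "b"), ("R1", "c", "d"), ("R1", "a", "e"),
                   ("R2", "c", "d"), ("R2", "a", "e"), ("R3", "s", "u")})

def all_of_r1(inst):
    """The query asking for all tuples of P1's relation R1."""
    return {(x, y) for (n, x, y) in inst if n == "R1"}

pcas = peer_consistent_answers([sol_a, sol_b], {"R1"}, all_of_r1)
print(sorted(pcas))  # [('a', 'b'), ('a', 'e'), ('c', 'd')]
```

The tuple (s, t) is not returned because it is absent from P1's table in one of the two solutions.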

Notice that this definition is relative to a fixed peer, and not only because the 
query is posed to one peer and in its query language, but also because this notion 
is based on the direct notion of solution for a single peer. 

Peer consistent answers to queries can be obtained by using techniques sim- 
ilar to those developed for CQA, for example, query rewriting based techniques 
[1,8]. However, there are important differences, because now we have some fixed
predicates in the repair process. 

Example 2. (Example 1 continued) If P1 is posed the query $Q: R^1(x, y)$ asking for
the tuples in relation $R^1$, we first rewrite the query by considering the exchange
dependencies in $\Sigma(P1, P2)$, obtaining $Q': R^1(x, y) \vee R^2(x, y)$, which basically
has the effect of bringing P2's data into P1. Next, the exchange dependency
$\Sigma(P1, P3)$ is considered, and now the query is rewritten into

$Q'': [R^1(x, y) \wedge \forall z_1((R^3(x, z_1) \wedge \neg\exists z_2 R^2(x, z_2)) \rightarrow z_1 = y)] \vee R^2(x, y)$. (1)

In order to answer this query, P1 will first issue a query to P2 to retrieve the
tuples in $R^2$; next, a query is issued to P3 to leave outside $R^1$ those tuples that
appear with the same first but not the same second argument in $R^3$, unless
the conflicting tuple in $R^1$ is "protected" by a tuple in $R^2$ which has the same
key as the two conflicting tuples in $R^1$ and $R^3$ (as is the case for $R^1(a, b)$ above). The answers
to query (1) are (a, b), (c, d), (a, e), precisely the peer consistent answers to query
$Q$ for peer P1 according to their semantic definition. □
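
The rewritten query can be evaluated directly on the stored relations, without computing any solution. The sketch below assumes that query (1) reads: keep an $R^1$ tuple unless some $R^3$ tuple with the same key and a different value exists and no $R^2$ tuple shares that key; then add all $R^2$ tuples. The function name `rewritten_query` and the encoding of relations as Python sets of pairs are illustrative choices.

```python
# Relations of Example 1 as stored at the three peers.
R1 = {("a", "b"), ("s", "t")}   # at P1
R2 = {("c", "d"), ("a", "e")}   # at P2
R3 = {("a", "f"), ("s", "u")}   # at P3

def rewritten_query(R1, R2, R3):
    """Standard evaluation of the rewritten query (1), under the reading stated above."""
    keys_in_R2 = {x for (x, _) in R2}
    kept = {(x, y) for (x, y) in R1
            # for all z1: (R3(x, z1) and no R2(x, _)) implies z1 == y
            if all(z1 == y for (x3, z1) in R3
                   if x3 == x and x not in keys_in_R2)}
    # Second disjunct of (1): import P2's tuples.
    return kept | R2

answers = rewritten_query(R1, R2, R3)
print(sorted(answers))  # [('a', 'b'), ('a', 'e'), ('c', 'd')]
```

The result coincides with the peer consistent answers obtained semantically from the two solutions of Example 1.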

Notice that a query Q may have peer consistent answers for a peer which are 
not answers to Q when the peer is considered in isolation. This makes sense, 
because the peer may import data from other peers. This is another difference 
with CQA, where all consistent answers are answers to the original query.^

The query rewriting approach suggested in Example 2 differs from the one
used for CQA. In the latter case, literals in the query are resolved (using reso- 
lution) against the ICs in order to generate residues that are appended as extra 
conditions to the query, in an iterative process. In the case of P2P data systems, 
the query may have to be modified in order to include new data that is located 
at a different peer's site. This cannot be achieved by imposing extra conditions 
alone -as in the query rewriting based consistent query answering- but instead, 
by relaxing the query in some sense. 

Instead of pursuing and fully developing a FO query transformation approach
to query answering in P2P systems, we will propose (see Section 3) an alternative
methodology based on answer set programming, which is more general. Furthermore,
since query answering in P2P systems already includes some sufficiently
complex cases of CQA, a FO query rewriting approach to P2P query answering
is bound to have important limitations in terms of completeness, as in CQA [1,8];
for example in the case of existential queries and/or existential DECs.
^ At least if the ICs are generic i.e. they do not imply by themselves the pres- 
ence/absence of any particular ground tuple in/from the database. 



3 Referential Exchange Constraints 

In most applications we may expect the exchange constraints to be inclusion
dependencies or referential constraints, i.e. formulas of the form

$\forall \bar{x} \exists \bar{y} (R^Q(\bar{x}) \wedge \cdots \rightarrow R^P(\bar{z}, \bar{y}) \wedge \cdots)$, (2)

where $R^Q, R^P$ are relations for peers Q and P, resp., the dots indicate some
possible additional conditions, most likely expressed in terms of built-ins, and $\bar{z} \subseteq \bar{x}$
(if $\bar{y} = \emptyset$ and $\bar{z} = \bar{x}$, and no additional conditions are given, we have a full
inclusion dependency, like $\Sigma(P1, P2)$ in Example 1).

An exchange constraint of the form (2) will most likely belong to $\Sigma(P, Q)$, i.e.
to peer P, who wants to import data from the more trusted peer Q. It could
also belong to Q, if this peer wants to validate its own data against the data at
P's site. Section 3.1 shows an example of a more involved referential constraint.

An answer set programming approach to the specification of solutions for a
peer can be developed. In spirit, those specifications would be similar to those of
repairs of single relational databases under referential integrity constraints [3].
However -as already seen in Examples 1 and 2- there are important differences
with CQA. In Section 3.1 we give an example that shows the main issues around
this kind of specification.*

3.1 An extended example 

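Consider a P2P data exchange system with peers P and Q, with schemas
$\{R_1(\cdot, \cdot), R_2(\cdot, \cdot)\}$, $\{S_1(\cdot, \cdot), S_2(\cdot, \cdot)\}$, resp. Peer P also has the exchange constraint

$\forall x \forall y \forall z \exists w (R_1(x, y) \wedge S_1(z, y) \rightarrow R_2(x, w) \wedge S_2(z, w))$, (3)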

which mixes tables of the two peers on each side of the implication. 

Let us assume that peer P is querying its database, but subject to its DEC
(3). We will consider the case where $(P, less, Q) \in trust$, i.e. P considers Q's data
more reliable than its own. If (3) is satisfied by the combination of the data
in P and Q, then the current global instance constitutes P's solution. Otherwise,
alternative solutions for P have to be found, keeping Q's data fixed in the process.
This is the case when there are ground tuples $R_1(d, m) \in r(P)$, $S_1(a, m) \in r(Q)$,
such that for no $t$ it holds both $R_2(d, t) \in r(P)$ and $S_2(a, t) \in r(Q)$.

Obtaining peer consistent answers to queries for peer P amounts to virtually
restoring the satisfaction of (3), actually by virtually modifying P's data. In order
to specify P's modified relations, we introduce virtual versions $R'_1, R'_2$ of $R_1, R_2$,
which will contain the data in peer P's solutions. In consequence, at the solution
level, we have the relations $R'_1, R'_2, S_1, S_2$. Since P is querying its database, its
original queries will be expressed in terms of relations $R'_1, R'_2$ only (plus possible
built-ins).

The contents of the virtual relations $R'_1, R'_2$ will be obtained from the contents
of the material sources $R_1, R_2, S_1, S_2$.^ Since $S_1, S_2$ are fixed, the satisfaction of

* A detailed and complete approach will be found in an extended version of this paper. 

^ We can observe that the virtual relations can be seen as virtual global relations in 
a virtual data integration system |24I21I . For a more detailed comparison between 
data integration and peer data management systems see |19l2fci| . 



(3) requires $R'_1$ to be a subset of $R_1$, and $R'_2$ a superset of $R_2$. The specification
of these relations can be done in disjunctive extended logic programs with answer
set (stable model) semantics [16]. The first rules for the specification program are

$R'_1(x, y) \leftarrow R_1(x, y),\ not\ \neg R'_1(x, y)$ (4)

$R'_2(x, y) \leftarrow R_2(x, y),\ not\ \neg R'_2(x, y)$, (5)

which specify that, by default, the tuples in the source relations are copied into
the new virtual versions, but with the exception of those that may have to be
removed in order to satisfy (3) (with $R_1, R_2$ replaced by $R'_1, R'_2$). Some of the
exceptions for $R'_1$ are specified by

$\neg R'_1(x, y) \leftarrow R_1(x, y), S_1(z, y),\ not\ aux_1(x, z),\ not\ aux_2(z)$ (6)

$aux_1(x, z) \leftarrow R_2(x, w), S_2(z, w)$ (7)

$aux_2(z) \leftarrow S_2(z, w)$. (8)



That is, $R_1(x, y)$ is deleted if it participates in a violation of (3) (what is captured
by the first three literals in the body of (6) plus rule (7)), and there is no way
to restore consistency by inserting a tuple into $R_2$, because there is no possible
matching tuple in $S_2$ for the possibly new tuple in $R_2$ (what is captured by the
last literal in the body of (6) plus rule (8)). In case there is such a tuple in $S_2$,
then we have the alternative of either deleting a tuple from $R_1$ or inserting a
tuple into $R_2$:

$\neg R'_1(x, y) \vee R'_2(x, w) \leftarrow R_1(x, y), S_1(z, y),\ not\ aux_1(x, z), S_2(z, w),\ choice((x, z), w)$. (9)

That is, in case of a violation of (3), when there is a tuple of the form $(a, t)$
in $S_2$ for the combination of values $(d, a)$, then the choice operator [17] non-deterministically
chooses a unique value for $t$, so that the tuple $(d, t)$ is inserted
into $R_2$ as an alternative to deleting $(d, m)$ from $R_1$. Notice that no exceptions are
specified for $R'_2$, which makes sense since $R'_2$ is a superset of $R_2$. In consequence,
the negative literal in the body of (5) can be eliminated. However, new tuples
can be inserted into $R_2$, which is captured by rule (9). Finally, the program must
contain as facts the tuples in the original relations $R_1, R_2, S_1, S_2$.
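
To make the intended models concrete, the following Python sketch enumerates by brute force the solutions that the program is meant to specify, rather than running an answer set solver. It assumes the instance given in the appendix ($r_1 = \{(a, b)\}$, $s_1 = \{(c, b)\}$, $r_2 = \{\}$, $s_2 = \{(c, e), (c, f)\}$), keeps Q's relations fixed, allows deletions from $R_1$ and insertions into $R_2$ only, and selects the candidates with subset-minimal changes. All function and variable names are hypothetical.

```python
from itertools import combinations

# Instance taken from the appendix; S1 and S2 belong to the more trusted peer Q.
R1 = {("a", "b")}
R2 = set()
S1 = {("c", "b")}
S2 = {("c", "e"), ("c", "f")}

def dec3(r1, r2):
    """DEC (3): each joined pair R1(x,y), S1(z,y) needs a w with R2(x,w) and S2(z,w)."""
    return all(
        any((x, w) in r2 for (z2, w) in S2 if z2 == z)
        for (x, y) in r1
        for (z, y2) in S1
        if y == y2
    )

def subsets(items):
    """All subsets of `items`, as sets."""
    items = sorted(items)
    return (set(c) for k in range(len(items) + 1) for c in combinations(items, k))

# Tuples (x, w) that could be inserted into R2 as witnesses, as in rule (9).
insertable = {(x, w) for (x, y) in R1 for (z, y2) in S1 if y == y2
              for (z2, w) in S2 if z2 == z} - R2

# Candidate pairs (R1', R2') that satisfy the DEC with Q's data kept fixed.
candidates = [(r1, R2 | ins) for r1 in subsets(R1) for ins in subsets(insertable)
              if dec3(r1, R2 | ins)]

def change(candidate):
    """Atoms deleted from R1 or inserted into R2 by a candidate."""
    r1, r2 = candidate
    return (frozenset(("R1",) + t for t in R1 - r1)
            | frozenset(("R2",) + t for t in r2 - R2))

solutions = [c for c in candidates
             if not any(change(d) < change(c) for d in candidates)]

for r1, r2 in solutions:
    print(sorted(r1), sorted(r2))
```

For this instance the enumeration yields three solutions: delete $R_1(a, b)$, insert $R_2(a, e)$, or insert $R_2(a, f)$. Inserting both witnesses is not minimal, which mirrors the role of the choice operator in picking a single value of $w$.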

In the case where P equally trusts itself and Q, both P's and Q's relations
become flexible when searching for a solution for P. The program becomes more
involved, because now $S_1, S_2$ may also change. In consequence, virtual versions
for them should be introduced and specified.

3.2 Considerations on specifications of peers' solutions 

The example we presented in Section 3.1 shows the main issues in the specification
of a peer's solutions under referential exchange constraints. If desired, the
choice operator can be replaced by a predicate that can be defined by means
of extra rules, producing the so-called stable version of the choice program [17].
This stable version has a completely standard answer set semantics.

The peer's solutions are in one to one correspondence with the answer sets of
the program. In the previous example, each solution $r'$ for peer P coincides with
the original, material, global instance for the tables other than $R_1, R_2$, whereas
the contents $r'_1, r'_2$ for these two are of the form $r'_i = \{\bar{t} \mid R'_i(\bar{t}) \in S\}$, where $S$
is an answer set of program $\Pi$. The absence of solutions for a peer will thus be
captured by the non-existence of answer sets for program $\Pi$.

Program $\Pi$ represents in a compact form all the solutions for a peer; in
consequence, the peer consistent answers to a query posed to the peer can be
obtained by running the query, expressed as a query program in terms of the
virtually repaired tables, in combination with the specification program $\Pi$. The
answers so obtained will be those that hold for all the possible solutions if the
program is run under the skeptical answer set semantics. As for consistent query
answering, a system like DLV [22] can be used for this purpose.

For example, the query $Q(x, z): \exists y (R_1(x, y) \wedge R_2(z, y))$ issued to peer P
would be peer consistently answered by running the query program
$Ans_Q(x, z) \leftarrow R'_1(x, y), R'_2(z, y)$ together with program $\Pi$. Although only (the new versions of)
P's relations appear in the query, the program may make P import data from Q.

If a peer has local ICs that have to be satisfied and a program has been used to
specify its solutions, then the program should take care of those constraints. One
simple way of doing this consists in using program denial constraints. If in Section
3.1 we had for peer P the local functional dependency (FD)
$\forall x \forall y \forall z (R_1(x, y) \wedge R_1(x, z) \rightarrow y = z)$, then program $\Pi$ would include the program constraint
$\leftarrow R'_1(x, y), R'_1(x, z), y \neq z$, which would have the effect of pruning those solutions
(or models of the program) that do not satisfy the FD. DLV, for example, can
handle program denial constraints [22].

A more flexible alternative to keeping the local ICs satisfied, consists in hav- 
ing the specification program split in two layers, where the first one builds the 
solutions, without considering the local ICs, and the second one, repairs the so- 
lutions wrt the local ICs, as done with single inconsistent relational databases 

IS! 

Finally, we should notice that obtaining peer consistent answers has at least
the data complexity of consistent query answering, for which some results are
known [12,15,11]. In the latter case, for common database queries and ICs,
$\Pi_2^P$-completeness is easily achieved. On the other hand, the problem of skeptical query
evaluation from the disjunctive programs we are using for P2P systems is also
$\Pi_2^P$-complete in data complexity [13]. In this sense, the logic programs are not
contributing additional complexity to our problem.

4 Discussion and Extensions 
4.1 Optimizations 

It is possible to perform some optimizations on the program, to make its evaluation
simpler. Disjunctive programs under the stable model semantics are more
complex than non-disjunctive programs [13]. However, it is known that a disjunctive
program can be transformed into a non-disjunctive program if the program is
head-cycle free (HCF) [4,22]. Intuitively speaking, a disjunctive program is HCF
if there are no cycles involving two literals in the head of a same rule, where a
link is established from a literal to another if the former appears positive in the
body of a rule, and the latter appears in the head of the same rule. These considerations
about HCF programs hold for programs that do not contain the choice
operator, i.e. they might not automatically apply to our programs that specify
the solutions for a peer under referential constraints. However, it is possible to
prove that a disjunctive choice program $\Pi$ is HCF when the program obtained
from $\Pi$ by removing its choice goals is HCF [6].

Example 3. Consider the choice program $\Pi$ presented in Section 3.1. If the choice
operator is eliminated from rule (9), we are left with the rule

$\neg R'_1(x, y) \vee R'_2(x, w) \leftarrow R_1(x, y), S_1(z, y),\ not\ aux_1(x, z), S_2(z, w)$.

The resulting program is HCF and then rule (9) can be replaced by two rules:

$\neg R'_1(x, y) \leftarrow R_1(x, y), S_1(z, y),\ not\ aux_1(x, z), S_2(z, w),\ not\ R'_2(x, w),\ choice((x, z), w)$.

$R'_2(x, w) \leftarrow R_1(x, y), S_1(z, y),\ not\ aux_1(x, z), S_2(z, w),\ not\ \neg R'_1(x, y),\ choice((x, z), w)$. □
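
The HCF condition can be checked mechanically on the dependency graph described above. The sketch below works at the predicate level, which is a conservative assumption: if no head cycle exists between predicates, none exists between ground literals either. The encoding of rules (4)-(9) as pairs of head and positive-body literals, with the choice goal of (9) removed, and the function names are illustrative.

```python
# Each rule is (head literals, positive body literals); negative body literals
# and the choice goal play no role in the HCF test and are omitted.
rules = [
    (["R1'"], ["R1"]),                        # rule (4)
    (["R2'"], ["R2"]),                        # rule (5)
    (["-R1'"], ["R1", "S1"]),                 # rule (6)
    (["aux1"], ["R2", "S2"]),                 # rule (7)
    (["aux2"], ["S2"]),                       # rule (8)
    (["-R1'", "R2'"], ["R1", "S1", "S2"]),    # rule (9) without its choice goal
]

def reachable(graph, start):
    """Nodes reachable from `start` by following at least one edge."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def is_head_cycle_free(rules):
    """True if no two head literals of the same rule lie on a common cycle."""
    graph = {}
    for heads, pos_body in rules:
        for b in pos_body:
            graph.setdefault(b, set()).update(heads)  # edge: body literal -> head literal
    for heads, _ in rules:
        for i, h1 in enumerate(heads):
            for h2 in heads[i + 1:]:
                if h2 in reachable(graph, h1) and h1 in reachable(graph, h2):
                    return False
    return True

print(is_head_cycle_free(rules))  # True
```

In this encoding no virtual predicate occurs positively in any rule body, so the graph has no cycles at all and the test succeeds, in agreement with the example.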
4.2 A LAV approach 

The logic programming-based approach proposed in Section 3.1 can be
assimilated to the global-as-view (GAV) approach to virtual data integration [21],
in the sense that the tables in the solutions are specified as views over the peer's
schemas. However, a local-as-view (LAV) approach could also be attempted. In
this case, we also introduce virtual, global versions of $S_1, S_2$. The relations in
the sources have to be defined as views of the virtual relations in a solution,
actually, through the following specification of a virtual integration system [18]:



View definitions                        label     source
$R_1(x, y) \leftarrow R'_1(x, y)$       closed    $r_1$
$R_2(x, y) \leftarrow R'_2(x, y)$       open      $r_2$
$S_1(x, y) \leftarrow S'_1(x, y)$       clopen    $s_1$
$S_2(x, y) \leftarrow S'_2(x, y)$       clopen    $s_2$

Here $r_i, s_j$ are the original material extensions of relations $R_i, S_j$. The labels
for the sources are assigned on the basis of the view definitions in the first
column, the IC (3) and the trust relationships; in the latter case, by the fact
that $R_1, R_2$ can change, but not $S_1, S_2$. More precisely, the label in the first
row corresponds to the fact that (3) can be satisfied by deleting tuples from $R_1$;
then the contents of the view defined in there must be contained in the original
relation $r_1$ (the material source). The label in the second row indicates that we
can insert tuples into $R_2$ to satisfy the constraint, and then, the extension of the
solution contains the original source $r_2$. Since $S_1, S_2$ do not change, we declare
them as both closed and open, i.e. clopen.

If a query is posed to, say, peer P, it has to be first formulated in terms of
$R'_1, R'_2$, and then it can be peer consistently answered by querying the integration
system subject to the global IC: $\forall x \forall y \forall z \exists w (R'_1(x, y) \wedge S'_1(z, y) \rightarrow R'_2(x, w) \wedge S'_2(z, w))$.
A methodology that is similar to the one applied for consistently



querying virtual data integration systems under LAV can be used. In |7lll) | 
methodologies for open sources are presented, and in [5] the mixed case with
both open, closed and clopen sources is treated. However, there are differences 
in our P2P scenario; and those methodologies need to be adjusted. 

The methodology presented in [5] for CQA in virtual data integration is
based on a three-layered answer set programming specification of the repairs 
of the system: a first layer specifies the contents of the global relations in the 
minimal legal instances (to this layer only open and clopen sources contribute), a 
second layer consisting of program denial constraints that prunes the models that 
violate the closure condition for the closed sources; and a third layer specifying 
the minimal repairs of the legal instances [Jj left by the other layers wrt the 
global ICs. For CQA, repairs are allowed to violate the original labels. 

In our P2P scenario, we want, first of all, to consider only the legal instances
that satisfy the mapping in the table and that, in the case of closed sources,
include the maximum amount of tuples from the sources (the virtual relations
must be kept as close as possible to their original, material versions). For the kind
of mappings that we have in the table, this can be achieved by using exactly the
same kind of specifications presented in [5] for the mixed case, but considering
the closed sources as clopen. In doing so, they will contribute to the program
with both rules that import their contents into the system (maximizing the set 
of tuples in the global relation) and denial program constraints. Now, the trust 
relation also makes a difference. In order for the virtual relations to satisfy the 
original labels, that in their turn capture the trust relationships, the rules that 
repair the chosen legal instances will consider only tuple deletions (insertions) for 
the virtual global relations corresponding to the closed (resp. open) sources. For 
clopen sources the rules can neither add nor delete tuples.^ This methodology 
can handle universal and simple referential DECs (no cycles and single atom 
consequents, conditions that are imposed by the repair layer of the program), 
which covers a broad class of DECs. The DEC in (3) does not fall in this class,
but the repair layer can be easily adjusted in order to generate the solutions for 
peer P. Due to space limitations, the program is given in the appendix. 

4.3 Beyond direct solutions 

It is natural to consider transitive data exchange dependencies. This is a situation
that arises when, e.g., a peer A, that is being queried, gets data from another
peer B, who in its turn -and without A possibly knowing- gets data from a third 
peer C to answer A's request. Most likely there won't be any explicit DEC from 
A to C capturing this transitive exchange; and we do not want to derive any. 

In order to attack peer consistent query answering in this more complex sce- 
nario, it becomes necessary to integrate the local solutions, what can be achieved 
by integrating the "local" specification programs. In this case, it is much more 
natural and simpler than extending the definition of solutions for the direct 
case, to define the semantics of a peer's (global) solutions directly as the answer 

^ This preference criterion for a subclass of the repairs is similar to the loosely-sound
semantics for integration of open sources under GAV [20].



sets of the combined programs. Of course, there might be no solutions, which is
reflected in the absence of stable models for the program. A problematic case
appears when there are implicit cyclic dependencies [19].

Example 4. (example in Section 3.1 continued) Let us consider another peer
C. The following exchange constraint $\Sigma(Q, C)$: $\forall x \forall y (U(x, y) \rightarrow S_1(x, y))$ exists
from Q to C and $(Q, less, C) \in trust$, meaning that Q trusts C's data more than
its own. When P requests data from Q, the latter will request data from C's
relation $U$. Now, consider the peer instances: $r_1 = \{(a, b)\}$, $s_1 = \{\}$, $r_2 = \{\}$,
$s_2 = \{(c, e), (c, f)\}$ and $u = \{(c, b)\}$. If we analyze each peer locally, the solution
for Q would contain the tuple $S_1(c, b)$ added; and P would have only one solution,
corresponding to the original instances, because the DEC is satisfied without
making any changes. When considering them globally, the tuple that is locally
added into Q requires tuples to be added and/or deleted into/from P in order
to satisfy the DEC. The combined program that specifies the global solutions 
consists of rules (gj, ©,0, plus (iCl), CU replace ©, resp.) 
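
The difference between the local and the global analysis in this example can be checked with a small sketch. It is only an illustration under two assumptions: the DEC between P and Q is taken to be R1(x, y) ∧ S1(z, y) → ∃w (R2(x, w) ∧ S2(z, w)), as read off the repair rules in the appendix, and the function names `enforce_inclusion` and `violations` are hypothetical.

```python
def enforce_inclusion(u, s1):
    """Q trusts C more than itself, so U(x, y) -> S1(x, y) is enforced by
    inserting the missing tuples of U into S1."""
    return set(s1) | set(u)


def violations(r1, s1, r2, s2):
    """Triples (x, y, z) with R1(x, y) and S1(z, y) but no w such that
    R2(x, w) and S2(z, w). The DEC itself is an assumption (see lead-in)."""
    return {
        (x, y, z)
        for (x, y) in r1
        for (z, y2) in s1
        if y == y2 and not any((x, w) in r2 for (z2, w) in s2 if z2 == z)
    }


# Peer instances of the example.
r1, s1, r2 = {("a", "b")}, set(), set()
s2 = {("c", "e"), ("c", "f")}
u = {("c", "b")}

# Local view of P: s1 is empty, so the DEC holds without any change.
local_violations = violations(r1, s1, r2, s2)        # set()

# Global view: Q first imports (c, b) from C, which creates a violation in P.
s1_global = enforce_inclusion(u, s1)
global_violations = violations(r1, s1_global, r2, s2)  # {("a", "b", "c")}

print(local_violations, global_violations)
```

The violation that appears only in the global view is what forces P to either delete R1(a, b) or insert a tuple into R2.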



The solutions obtained from the stable models of the program are precisely 
the expected ones: r̄1 = {S'2(c, e), S'2(c, f), U(c, b), S'1(c, b), R'2(a, f), R'1(a, b)}, 
r̄2 = {S'2(c, e), S'2(c, f), U(c, b), S'1(c, b)}, r̄3 = {S'2(c, e), S'2(c, f), U(c, b), S'1(c, b), 

5 Appendix: 

The following answer set program specifies the solutions for the example in 
Section 3.1 following a LAV approach to P2P data exchange (see Section 4.2). 
Assume that the peers have the following instances: r1 = {(a, b)}, s1 = {(c, b)}, 
r2 = {} and s2 = {(c, e), (c, f)}. Then, the facts of the program are: R1(a, b), 
S1(c, b), S2(c, e), S2(c, f). The layer that specifies the preferred legal instances 
contains the following rules: 

R'1(X, Y, td) ← R1(X, Y). 
S'1(X, Y, td) ← S1(X, Y). 
R'2(X, Y, td) ← R2(X, Y). 
S'2(X, Y, td) ← S2(X, Y). 

← R'1(X, Y, td), not R1(X, Y). 
← S'1(X, Y, td), not S1(X, Y). 
← S'2(X, Y, td), not S2(X, Y). 

The layer that specifies the repairs of the legal instances contains the following 
rules. The annotation constants in the third arguments of the relations are used 
as auxiliary elements in the repair process. The choice operator has been 
unfolded, producing the stable version of the choice program. 



R'1(X, Y, t**) ← R'1(X, Y, td), not R'1(X, Y, fa). 
R'1(X, Y, t**) ← R'1(X, Y, ta). 
               ← R'1(X, Y, ta), R'1(X, Y, fa). 

S'1(X, Y, t**) ← S'1(X, Y, td), not S'1(X, Y, fa). 
S'1(X, Y, t**) ← S'1(X, Y, ta). 
               ← S'1(X, Y, ta), S'1(X, Y, fa). 

R'2(X, Y, t**) ← R'2(X, Y, td), not R'2(X, Y, fa). 
R'2(X, Y, t**) ← R'2(X, Y, ta). 
               ← R'2(X, Y, ta), R'2(X, Y, fa). 

S'2(X, Y, t**) ← S'2(X, Y, td), not S'2(X, Y, fa). 
S'2(X, Y, t**) ← S'2(X, Y, ta). 
               ← S'2(X, Y, ta), S'2(X, Y, fa). 

R'1(X, Y, fa) ← R'1(X, Y, td), S'1(Z, Y, td), not aux1(X, Z), 
                not aux2(Z). 
aux1(X, Z) ← R'2(X, U, td), S'2(Z, U, td). 
aux2(Z) ← S'2(Z, W, td). 

R'1(X, Y, fa) ∨ R'2(X, W, ta) ← R'1(X, Y, td), S'1(Z, Y, td), not aux1(X, Z), 
                                S'2(Z, W, td), chosen(X, Z, W). 
chosen(X, Z, W) ← R'1(X, Y, td), S'1(Z, Y, td), not aux1(X, Z), 
                  S'2(Z, W, td), not diffchoice(X, Z, W). 
diffchoice(X, Z, W) ← chosen(X, Z, U), S'2(Z, W, td), U ≠ W. 




The following are the stable models of the program: 

M1 = {R1(a, b), S1(c, b), S2(c, e), S2(c, f), R'1(a, b, td), S'1(c, b, td), 
S'2(c, e, td), S'2(c, f, td), aux2(c), S'1(c, b, t**), S'2(c, e, t**), 
S'2(c, f, t**), R'1(a, b, t**), diffchoice(a, c, e), chosen(a, c, f), 
R'2(a, f, ta), R'2(a, f, t**)} 

M2 = {R1(a, b), S1(c, b), S2(c, e), S2(c, f), R'1(a, b, td), S'1(c, b, td), 
S'2(c, e, td), S'2(c, f, td), aux2(c), S'1(c, b, t**), S'2(c, e, t**), 
S'2(c, f, t**), R'1(a, b, fa), diffchoice(a, c, e), chosen(a, c, f)} 

M3 = {R1(a, b), S1(c, b), S2(c, e), S2(c, f), R'1(a, b, td), S'1(c, b, td), 
S'2(c, e, td), S'2(c, f, td), aux2(c), S'1(c, b, t**), S'2(c, e, t**), 
S'2(c, f, t**), R'1(a, b, t**), chosen(a, c, e), diffchoice(a, c, f), 
R'2(a, e, ta), R'2(a, e, t**)} 

M4 = {R1(a, b), S1(c, b), S2(c, e), S2(c, f), R'1(a, b, td), S'1(c, b, td), 
S'2(c, e, td), S'2(c, f, td), aux2(c), S'1(c, b, t**), S'2(c, e, t**), 
S'2(c, f, t**), R'1(a, b, fa), chosen(a, c, e), diffchoice(a, c, f)}, 

which correspond to the following solutions (they can be obtained by selecting 
only the tuples with annotation t**): r̄1 = {S'1(c, b), S'2(c, e), S'2(c, f), 
R'1(a, b), R'2(a, f)}, r̄2 = {S'1(c, b), S'2(c, e), S'2(c, f)}, r̄3 = {S'1(c, b), 
S'2(c, e), S'2(c, f), R'1(a, b), R'2(a, e)}, r̄4 = {S'1(c, b), S'2(c, e), S'2(c, f)}. 
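
The choices made by the repair rules on this instance can be reproduced with a short brute-force sketch. It is a simplification, not a stable-model computation: it assumes the DEC R1(x, y) ∧ S1(z, y) → ∃w (R2(x, w) ∧ S2(z, w)) read off the rules above, treats S1 and S2 as fixed, makes one independent choice per violation, and uses hypothetical function names. With several interacting violations it may produce candidates that a stable-model semantics would discard.

```python
from itertools import product


def violations(r1, s1, r2, s2):
    """Triples (x, y, z) with R1(x, y), S1(z, y) and no w with
    R2(x, w) and S2(z, w)."""
    return sorted(
        (x, y, z)
        for (x, y) in r1
        for (z, y2) in s1
        if y == y2 and not any((x, w) in r2 for (z2, w) in s2 if z2 == z)
    )


def repair_options(violation, s2):
    """Options for one violation: with no witness in S2, R1(x, y) must be
    deleted; otherwise a witness w is chosen and then either R1(x, y) is
    deleted or R2(x, w) is inserted."""
    x, y, z = violation
    witnesses = sorted(w for (z2, w) in s2 if z2 == z)
    if not witnesses:
        return [("delete", (x, y))]
    options = []
    for w in witnesses:
        options.append(("delete", (x, y)))
        options.append(("insert", (x, w)))
    return options


def solutions(r1, s1, r2, s2):
    """Distinct (R1, R2) pairs reachable by one choice per violation."""
    found = set()
    per_violation = [repair_options(v, s2) for v in violations(r1, s1, r2, s2)]
    for combo in product(*per_violation):
        new_r1, new_r2 = set(r1), set(r2)
        for action, tup in combo:
            if action == "delete":
                new_r1.discard(tup)
            else:
                new_r2.add(tup)
        found.add((frozenset(new_r1), frozenset(new_r2)))
    return found


# Instance from the appendix.
sols = solutions({("a", "b")}, {("c", "b")}, set(), {("c", "e"), ("c", "f")})
for new_r1, new_r2 in sols:
    print(sorted(new_r1), sorted(new_r2))
```

On this instance the sketch goes through four combinations, mirroring the four stable models, but yields only three distinct results, because the two models that delete R1(a, b) induce the same solution.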



